Speaker compensation with sine-log all-pass transforms

نویسندگان

  • John W. McDonough
  • Florian Metze
  • Hagen Soltau
  • Alexander H. Waibel
چکیده

In recent work, we proposed the rational all-pass transform (RAPT) as the basis of a speaker adaptation scheme intended for use with a large vocabulary speech recognition system. It was shown that RAPT-based adaptation reduces to a linear transformation of cepstral means, much like the better known maximum likelihood linear regression (MLLR). In a set of speech recognition experiments conducted on the Switchboard Corpus, we obtained a word error rate (WER) of 37.9% using RAPT adaptation, a significant improvement over the 39.5% WER achieved with MLLR. In the present work, we propose the sine-log all-pass transform (SLAPT) as a replacement for the RAPT. Our findings indicate the SLAPT is just as effective as the RAPT at reducing WER when used as the basis for a variety of speaker compensation schemes, but in addition conduces to far more tractable computation of transformed cepstral sequences, and the estimation of optimal transform parameters.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker Normalization with All-pass Transforms Center for Language and Speech Processing 72 Speaker Normalization with All-pass Transforms

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...

متن کامل

Speaker Identification using Frequency Dsitribution in the Transform Domain

In this paper, we propose Speaker Identification using the frequency distribution of various transforms like DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), Hartley, Walsh, Haar and Kekre transforms. The speech signal spoken by a particular speaker is converted into frequency domain by applying the different transform techniques. The distributio...

متن کامل

Speaker normalization with all-pass transforms

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a speaker's speech is rescaled or warped prior to the extraction of cepstral...

متن کامل

Speaker adaptation with all-pass transforms

In recent work, a class of transforms were proposed which achieve a remapping of the frequency axis much like conventional vocal tract length normalization. These mappings, known collectively as all-pass transforms (APT), were shown to produce substantial improvements in the performance of a large vocabulary speech recognition system when used to normalize incoming speech prior to recognition. ...

متن کامل

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion (Lee and Rose, 1998). A linear transform (LT) equivalent for frequency warping (FW) would enable more efficie...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001